eigenvectors of this matrix provide observables, and when these are plotted in the

I. INTRODUCTION

In a previous publication [1], we established the relation between phases, including
metastable phases, and eigenvalue degeneracy, where the eigenvalues in question are in
the spectrum of a matrix of transition probabilities. The context for this use of stochastic
dynamics is an approach to nonequilibrium statistical mechanics based on the master
equation [2]. In further work, these phases played a role in defining coarse grains in statistical
mechanics [3] as well as in discerning community structure in a network [4].

In the present article we re-examine the occurrence of multiple phases. For the case
of $m+1$ phases, there turns out to be surprising simplicity in a certain $m$-dimensional
space. This is a space in which the points of the state space are given coordinate values
corresponding to the first $m$ observables, where by observable we mean a slow eigenvector (in
our convention, a left eigenvector) of the transition matrix. Remarkably, although one might
expect no particular structure to emerge from this representation, for the case of a phase
transition (i.e., eigenvalue degeneracy) the points form a simplex, which is to say, there is
no more than the minimal number ($m+1$) of extremal points for the convex hull of the set
of state space points in this representation. We call this geometric structure the observable
representation of state space, and it provides a practical method for the computation of the
probability that an arbitrary initial state reaches one or another phase. This leads in turn to
potential applications far removed from statistical mechanics. Thus one can have imperfectly
defined classes of final states (i.e., they are similar but not exactly the same) for a complex
random walk and be able to compute probabilities for arriving at each class. In fact one need
not know the classes ahead of time. Moreover, this can be done for dynamics with relatively
large state spaces. For a state space of cardinality $N$, the stochastic dynamics is generated
by a matrix with $N^2$ elements, which may be daunting. Nevertheless, our method requires
relatively little information about that matrix: the first few eigenvalues and eigenvectors.
Hence for sparse matrices, which characterize many random walks, these quantities can be
computed.

A variation of the method also provides a striking diagrammatic representation of
metastable phases, particularly when there is a hierarchical tree-like structure, as occurs
in spin glasses. One can see shorter and shorter lived (metastable) phases peel away (in
reversed time) from those that are closer to the root of the tree. In previous work [5] we
explored models of this sort, but in the present article there is fuller understanding and
exploitation of the observables. We have also studied other features of the transition matrix
spectrum, for example situations where the eigenvalues do not drop abruptly as the index
increases. This should enhance the utility of this work in the spin glass context [6].

This article has two principal sections. In Sec. II we develop the mathematical basis
for the assertions just made, and in Sec. III we provide examples in which those assertions
are realized. Because of the density of mathematical estimates, we begin Sec. II with an
overview, which should allow an understanding of the examples without having to go through
too many details. The remainder of that section is devoted to those details. Finally in Sec.
IV we discuss problems that may benefit from this treatment as well as mathematical issues,
such as whether certain of our hypotheses might be weakened.

II. PHASE TRANSITIONS AND EXTREMALS IN THE SPACE OF OBSERVABLES

A. Overview 

The states of the system we study are given by $x, y \in X$, and the system moves from
state to state in discrete time according to transition probabilities given by a matrix $R$. For
convenience in dealing with the eigenfunctions of $R$ we define it as follows:

$$R_{xy} = \text{``}\Pr(x \leftarrow y)\text{''} = \Pr[\text{State at time } (t+1) \text{ is } x \mid \text{State at time } t \text{ is } y] . \tag{1}$$

For the processes we study there is a unique stationary distribution, $p_0(x)$, which satisfies
$p_0 = R p_0$, with the $p$ on the right of $R$. For this eigenvalue of $R$, $\lambda = 1$, the left eigenvector
is simply $A_0(x) = 1$, corresponding to the conservation of probability, i.e., $\sum_x R_{xy} = 1$,
because from a state $y$ you have to go somewhere.
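The column convention of Eq. (1) can be checked on a small example. The following is a minimal sketch in Python with NumPy; the 3-state matrix and all variable names are illustrative assumptions, not taken from the text. It verifies that the constant vector is a left eigenvector with eigenvalue 1 and extracts the stationary distribution as the matching right eigenvector.

```python
import numpy as np

# Illustrative 3-state chain (an assumption for this sketch).
# R[x, y] = Pr(x <- y), so each COLUMN sums to 1.
R = np.array([[0.90, 0.05, 0.10],
              [0.05, 0.90, 0.10],
              [0.05, 0.05, 0.80]])

column_sums = R.sum(axis=0)

# The constant vector A_0(x) = 1 is a left eigenvector: A_0 R = A_0.
A0 = np.ones(3)
left_check = A0 @ R

# Stationary distribution: right eigenvector of R with eigenvalue 1,
# normalized so that its entries sum to 1.
vals, vecs = np.linalg.eig(R)
k = int(np.argmin(np.abs(vals - 1.0)))
p0 = np.real(vecs[:, k])
p0 = p0 / p0.sum()
```

For this particular matrix the stationary distribution works out to (0.4, 0.4, 0.2), which can be confirmed by hand from $p_0 = R p_0$.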

The eigenvalues of $R$ fall on or inside the unit circle [7, 8] and we order them by decreasing
magnitude: $\lambda_0 = 1 \ge |\lambda_1| \ge |\lambda_2| \ge \cdots$. The corresponding right and left eigenvectors are
respectively $p_k$ and $A_k$, and satisfy

$$R p_k = \lambda_k p_k , \qquad A_k R = \lambda_k A_k , \qquad k = 0, 1, \ldots . \tag{2}$$

Our story begins when several of $R$'s eigenvalues are nearly degenerate with $\lambda_0 = 1$.
As we showed in [1], this heralds a phase transition in the system, enabling the realization
of an old dream relating eigenvalue degeneracy and phase transitions [9]. Our method of
proof involved those left eigenvectors of $R$ that correspond to the slowest eigenvalues (those
nearest to 1). We now find that not only was this convenient for the proof, but it also
provides a graphic illustration of phase structure along with the possibility of computing
auxiliary quantities such as asymptotic probabilities and time dependence.

We suppose then that $m$ of the eigenvalues (after $\lambda_0$) are very close to 1 and that $\lambda_{m+1}$
is not. We focus on the left eigenvectors of $R$. If $X$ has $N$ states, then we can form an
$m$ by $N$ array of quantities, $A_k(x)$, with $k = 1, \ldots, m$, and $x \in X$. Think of the $m$-tuple
$(A_1(x), \ldots, A_m(x))$ as an $m$-vector, with one such vector for each $x \in X$. All $N$ of these
can be plotted in $\mathbb{R}^m$ and we surround this set of points with a minimal convex surface,
the convex hull. In general such a surface can have many extremal points. We will show,
however, that because of the eigenvalue conditions, the convex hull of this particular set of
points has (essentially) just $m+1$ extrema, around each of which many, many points of $X$
may cluster. These extremal points are what correspond to the phases. To see what they
look like in a typical case, with $m = 2$, see Fig. 2. This is a plot of $A_1(x)$ versus $A_2(x)$. The
vertices of the triangle that you see are actually composed of many points (shown in more
detail in Fig. 3).
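The construction just described can be sketched numerically. The example below (Python with NumPy) is only an illustration under stated assumptions: the function name `observable_representation`, the block-coupling matrix `M`, and the block size `n` are hypothetical, and the toy chain is deliberately built so that the slow left eigenvectors are exactly constant on each block of states.

```python
import numpy as np

def observable_representation(R, m):
    """Return (lam, points): the m slowest nontrivial eigenvalues of R and the
    N x m array whose row x is (A_1(x), ..., A_m(x))."""
    vals, vecs = np.linalg.eig(R.T)      # left eigenvectors of R = right eigenvectors of R^T
    order = np.argsort(-np.abs(vals))    # decreasing magnitude; order[0] is lambda_0 = 1
    idx = order[1:m + 1]
    A = np.real(vecs[:, idx])            # shape (N, m)
    A = A / np.max(np.abs(A), axis=0)    # normalize so that max_x |A_k(x)| = 1
    return np.real(vals[idx]), A

# Toy model (assumption): 3 blocks of n states each. M[i, j] is the probability
# of jumping from block j to block i; the target state inside the destination
# block is chosen uniformly.
M = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.97, 0.02],
              [0.01, 0.02, 0.97]])
n = 4
R = np.kron(M, np.full((n, n), 1.0 / n))  # 12 x 12, every column sums to 1

lam, points = observable_representation(R, m=2)

# Largest spread of the coordinates (A_1, A_2) within any one block.
spread = 0.0
for b in range(3):
    blk = points[b * n:(b + 1) * n]
    spread = max(spread, float((blk.max(axis=0) - blk.min(axis=0)).max()))
```

Plotting `points[:, 0]` against `points[:, 1]` for this toy chain gives three coincident clusters at the corners of a triangle. In a less idealized matrix the clusters would be blurred rather than exactly coincident.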

There is a straightforward intuitive way to understand this bunching. The phases are
in a sense dynamically far from one another. That is, if you start in one phase you expect
to stay there for a long while before going to any other phase. This means that there is a
restricted dynamics within that phase that nearly conserves probability. The bunching of
points in one phase (as we define it) means that for all points in that phase $A_k(x)$ has very
nearly the same value (for every $k = 1, \ldots, m$). Let's see why that happens. Consider the
eigenvalue equation for $A_k$ applied $t$ times, where $t$ is small enough so $\lambda_k^t$ is still close to one,
so close that we will now treat it as unity. Then

$$A_k(y) \approx \sum_x A_k(x) R^t_{xy} . \tag{3}$$

Now restrict $x$ and $y$ to be such that $R^t_{xy}$ is not small, so we would say $x$ and $y$ are in the
same phase. Then for this restriction of $R$, Eq. (3) still holds, and we appear to have a
number of eigenvectors of eigenvalue close to unity. But we also have $1 \approx \sum_x R^t_{xy}$, because
little probability escapes the phase. This last expression says that a constant on the phase
plays the role of $A_0$ for the restricted time evolution. If we now make the further assumption
that relaxation within the phase is relatively rapid, then other eigenvalues of the restricted $R$
are significantly smaller than 1, and all the apparent eigenvectors $A_k$, as well as the constant
pseudo-$A_0$, must in fact be proportional to one another. In other words, the $A_k$'s ($k \le m$)
are constant on the phases.

The actual proof proceeds a bit differently. Its heart is Eq. (21), which is essentially a
statement of the eigenvector property of the $A$'s. In this equation, we write $p^t_y(x)$ in place
of $R^t_{xy}$, since (as just observed) the latter is the probability that starting from $y$ you reach
$x$ in $t$ time steps.



B. Additional properties of the stochastic matrix 

The matrix $R$ is assumed to be irreducible. This implies that the eigenvalue 1 is unique
and that its eigenvector, $p_0$ (the stationary distribution), is strictly positive:

$$\lambda_1 \neq 1 \quad \text{and} \quad \sum_{y \in X} R_{xy}\, p_0(y) = p_0(x) > 0 , \quad \forall x . \tag{4}$$

Since no detailed balance assumptions are made for $R$, it need not be diagonalizable nor
have a spectral representation in terms of eigenvectors. Nevertheless, we will assume that
for the eigenvalues that concern us (those near 1) each eigenvalue possesses one or more
eigenvectors. The orthonormality condition for the eigenvectors, $\langle A_k | p_\ell \rangle = \delta_{k\ell}$, still leaves
a single multiplicative factor for each pair $(A_k, p_k)$. The stationary state $p_0$ is naturally
normalized by $\sum_x p_0(x) = 1$, which fixes $A_0(x) = 1$ for all $x$. For the other $A$'s, consistent
with $A_0$, we normalize by the condition

$$\max_x |A_k(x)| = 1 , \quad \forall k . \tag{5}$$

Our principal assumption is that for some integer $m$, $\lambda_1, \lambda_2, \ldots, \lambda_m$ are real and close to
$\lambda_0 = 1$. Specifically, this closeness is taken to mean that there exists a range of $t$ (integer
times) such that for some $\epsilon \ll 1$

$$1 - \lambda_k^t = O(\epsilon) , \quad 1 \le k \le m . \tag{6}$$

In most of our development we further assume that the $|\lambda_k|$ for $k > m$ are much smaller
than $\lambda_m$, that is,

$$|\lambda_k^t| \ll 1 , \quad k \ge m+1 . \tag{7}$$
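Conditions (6) and (7) together single out a window of times $t$. As a rough illustration, the sketch below (plain Python) scans integer times and reports the window for given eigenvalues. The function name `time_window` and the numerical thresholds `eps` and `eta`, which stand in for "$O(\epsilon)$" and "$\ll 1$", are assumptions made only for this example.

```python
def time_window(lam_m, lam_next, eps, eta, t_limit=10_000):
    """Integer times t with 1 - lam_m**t <= eps and |lam_next|**t <= eta.

    Returns (first_t, last_t) of the window, or None if no such t exists
    up to t_limit.
    """
    good = [t for t in range(1, t_limit + 1)
            if 1.0 - lam_m ** t <= eps and abs(lam_next) ** t <= eta]
    return (good[0], good[-1]) if good else None

# Example values (assumptions): lambda_m = 0.999, lambda_{m+1} = 0.5.
window = time_window(lam_m=0.999, lam_next=0.5, eps=0.05, eta=0.01)
```

With these values the fast mode has decayed below the threshold from $t = 7$ on, while the slow mode stays within the tolerance up to $t = 51$, so the window is wide. When $\lambda_{m+1}$ is not well separated from $\lambda_m$ the function returns `None`.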

Remark: If $R$ has eigenvalues near $-1$ (or other roots of unity), our arguments go through
for $R^2$ (or other appropriate power of $R$, provided that that power is not so high that Eq. (6)
is violated). See [10], Sec. 8.

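The spectral decomposition of $R$ is written

$$R^t = |p_0\rangle\langle A_0| + \sum_{k=1}^{m} \lambda_k^t\, |p_k\rangle\langle A_k| + |\lambda_{m+1}^t|\, B^{(t)} \tag{8}$$

with

$$B^{(t)} = \sum_{k>m} \frac{\lambda_k^t}{|\lambda_{m+1}^t|}\, |p_k\rangle\langle A_k| \tag{9}$$

[11]. We assume that $\lambda_{m+1}^t B^{(t)}$ is uniformly small in the sense that for any subset $Y \subset X$ and
any $x \in X$

$$|\lambda_{m+1}^t| \, \Bigl| \sum_{y \in Y} B^{(t)}_{xy} \Bigr| = O(\eta) \ll O(1) . \tag{10}$$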

Remark: Instead of the stochastic matrix "$R$" we sometimes use an alternative matrix,
"$W$," which can be thought of as the generator of continuous time stochastic evolution.
Schematically $W = (R - 1)/\Delta t$. $W$ will be called a stochastic generator. In fact it generates
the usual Master Equation. The constraints on $W$ are that its off-diagonal matrix elements
must be non-negative and that its column sums must vanish. Its spectrum consists of 0
and points to the left of the $y$-axis in the complex plane. Eigenfunctions are unchanged
from the $R$ representation (since the matrices differ only by a shift by the identity and an
overall rescaling). The advantage of using $W$ is that in producing randomly generated
matrices one need not be concerned that the column sum of off-diagonal terms be less than one.
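The two constraints on $W$ are easy to enforce in practice. The following minimal sketch (Python with NumPy) builds a generator from arbitrary non-negative rates and converts it to a stochastic matrix through $R = 1 + W\,\Delta t$. The function name `generator_from_rates`, the rate values, and the step `dt` are illustrative assumptions.

```python
import numpy as np

def generator_from_rates(rates):
    """Build W from non-negative off-diagonal rates; rates[x, y] is the rate y -> x."""
    W = np.array(rates, dtype=float)
    np.fill_diagonal(W, 0.0)              # ignore any diagonal entries supplied
    W -= np.diag(W.sum(axis=0))           # make every column sum vanish
    return W

rates = [[0.0, 0.2, 0.1],
         [0.3, 0.0, 0.4],
         [0.1, 0.5, 0.0]]
W = generator_from_rates(rates)

dt = 0.5                                   # small enough that R has no negative entries
R = np.eye(3) + dt * W                     # inverts W = (R - 1) / dt

max_real_part = float(np.max(np.real(np.linalg.eigvals(W))))
```

Any non-negative array of rates can be used as input. The only remaining care is in the choice of `dt`, which must keep the diagonal of `R` non-negative.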

C. The phases 

Under the foregoing hypotheses and under a separation hypothesis "$S$" to be introduced
below, we will construct $m+1$ subsets, ultimately to be identified as the phases,
$X^{(1)}, \ldots, X^{(m+1)}$, of $X$ with the following properties:

1. The sets, $X^{(1)}, \ldots, X^{(m+1)}$, are disjoint.

2. On each $X^{(\ell)}$, the $A_k$ are nearly constant, for $1 \le \ell \le m+1$ and $1 \le k \le m$.

3. The complementary subset of $\bigcup_{\ell=1}^{m+1} X^{(\ell)}$ has small $p_0$-weight. Specifically

$$\sum_{y \in \bigcup_\ell X^{(\ell)}} p_0(y) \ge 1 - \frac{m\epsilon}{\delta} - O(\eta) , \tag{11}$$

for a constant, $\delta$, such that $\epsilon/\delta$ is small (and $\epsilon = 1 - \lambda_m^t$).

4. The $X^{(1)}, \ldots, X^{(m+1)}$ are essentially unique in a sense to be described below.

Remark: Our phases are a bit larger than what are conventionally called phases and include
states that rapidly transit to the usual phases. See Sec. II H.

D. Proof of the existence of the phases 

For any pair $x, y \in X$, define

$$p^t_y(x) = R^t_{xy} . \tag{12}$$

$p^t_y(x)$ is the probability that a system in state $y$ at time 0 is in $x$ at time $t$.

Let $m < N - 1$. Consider the following geometric construction in $\mathbb{R}^m$: for any $y \in X$,
form the vector $A(y) = (A_1(y), \ldots, A_m(y)) \in \mathbb{R}^m$. This gives a set $\mathcal{A}$ of $N$ vectors in $\mathbb{R}^m$.
Let $\hat{\mathcal{A}}$ be the convex hull of $\mathcal{A}$.

The first remark is that the vector $0 = (0, 0, \ldots, 0)$ is in $\hat{\mathcal{A}}$.
This follows from the orthogonality relation, $\langle A_k | p_0 \rangle = 0$, for $k \ge 1$. Thus

$$\sum_{y \in X} p_0(y)\, A(y) = 0 , \tag{13}$$

so that 0 is a convex combination of the vectors of $\mathcal{A}$.
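Eq. (13) is just the orthogonality of $p_0$ to the nontrivial left eigenvectors, and it can be confirmed numerically. The sketch below (Python with NumPy) does so for a small illustrative chain. The matrix and the variable names are assumptions, and for simplicity all nontrivial left eigenvectors of the 3-state example are used.

```python
import numpy as np

R = np.array([[0.90, 0.05, 0.10],
              [0.05, 0.90, 0.10],
              [0.05, 0.05, 0.80]])        # illustrative; columns sum to 1

# Right eigenvectors give the p_k, eigenvectors of R^T give the A_k.
vals_r, P = np.linalg.eig(R)
vals_l, L = np.linalg.eig(R.T)

p0 = np.real(P[:, np.argmax(np.real(vals_r))])
p0 = p0 / p0.sum()                         # stationary distribution, sums to 1

slow = np.argsort(-np.real(vals_l))[1:]    # indices of the nontrivial left eigenvectors
A = np.real(L[:, slow])                    # row y is the vector A(y)

centroid = p0 @ A                          # sum_y p0(y) A(y), the left side of Eq. (13)
```

Because the weights $p_0(y)$ are positive and sum to one, the vanishing of `centroid` is exactly the statement that the origin is a convex combination of the points $A(y)$.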

As a consequence, one can find $m+1$ points $y^*_\ell$, $1 \le \ell \le m+1$, such that the vectors

$$E_\ell = A(y^*_\ell) \tag{14}$$

are extremal points of the convex hull $\hat{\mathcal{A}}$, and such that 0 is a convex combination of them [12, 13]. There
may be several ways to choose these points $y^*_\ell$, but we shall prove that, in fact, the resulting
vectors $E_\ell$ ($1 \le \ell \le m+1$) are uniquely defined up to a small ambiguity to be stated later.
By the selection of the $E_\ell$, we can find $\mu_\ell$, $1 \le \ell \le m+1$, with $0 \le \mu_\ell \le 1$, $\sum_{\ell=1}^{m+1} \mu_\ell = 1$,
such that

$$\sum_\ell \mu_\ell E_\ell = 0 . \tag{15}$$

We have found that there are subsets of these points that are separated from one another
in a particular way, and we add to our assumptions the following "separation" hypothesis
("$S$"):

Hypothesis S: For each $\ell$ let

$$\Phi_\ell = \min_{k = 1, \ldots, m+1,\; k \neq \ell} \| E_\ell - E_k \| , \tag{16}$$

and define

$$\Phi = \min_\ell \frac{\Phi_\ell}{\|E_\ell\|} . \tag{17}$$

Then our hypothesis is that the extrema $y^*_\ell$ can be selected so that (in addition to Eq. (15))
they satisfy the following:

$$1 - \lambda_m^t \equiv \epsilon \ll \Phi \le O(1) \tag{18}$$

(for an appropriate range of $t$).

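We next observe that by definition

$$\lambda_k^t A_k(y^*_\ell) = \sum_{y \in X} A_k(y)\, R^t_{y y^*_\ell} , \tag{19}$$

so that for all $\ell$

$$\bigl(\lambda_1^t A_1(y^*_\ell),\, \lambda_2^t A_2(y^*_\ell),\, \ldots,\, \lambda_m^t A_m(y^*_\ell)\bigr) = \sum_{y \in X} p^t_{y^*_\ell}(y)\, A(y) . \tag{20}$$

Because $0 \le \lambda_k^t \le 1$, $k \le m$, the vector on the left side of Eq. (20) is in the convex hull
$\hat{\mathcal{A}}$; moreover, its distance from the extremal vector $E_\ell = (A_1(y^*_\ell), \ldots, A_m(y^*_\ell))$ is less than
$1 - \lambda_m^t$. Since $\sum_y p^t_{y^*_\ell}(y) = 1$, we have from Eq. (20)

$$E_\ell - \bigl(\lambda_1^t A_1(y^*_\ell),\, \lambda_2^t A_2(y^*_\ell),\, \ldots,\, \lambda_m^t A_m(y^*_\ell)\bigr) = \sum_y p^t_{y^*_\ell}(y)\, \bigl(E_\ell - A(y)\bigr) . \tag{21}$$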

The idea of the next few steps is as follows. For the case $m = 1$ the above equation
immediately yields the desired result. On the left you have something that is very small.
On the right is a sum of products, each factor of which is positive. Therefore one or the
other of these factors must be small. This means that if $y$ can be reached from $y^*$ (with
moderate probability), then $A_1(y)$ cannot be much different from $A_1(y^*)$ (in this case $E_1$ is
just $A_1(y^*)$). For $m > 1$ the positivity of $(E_\ell - A(y))$ is not manifest, so that it is useful
to change coordinates and take as origin the vertex of the cone based on $E_\ell$. In this way all
distances away from the vertex are positive in an appropriate coordinate system. Eq. (21)
then yields the near constancy of the $A$'s on the phase.

We now pick a particular $\ell$, and consider the translated set of points in $\mathbb{R}^m$, i.e., the set
of vectors $\{E_\ell - A(y) \mid y \in X\}$. Then, the vector 0 is extremal for the convex hull of these
translated vectors. We prove the following lemma:

Lemma: Consider a finite set of vectors $w(y) \in \mathbb{R}^m$, $y \in X$, such that 0 is an extremal
point of the convex hull of those vectors. Let $w_0 \in \mathbb{R}^m$ be a vector in the convex hull of the
$\{w(y)\}$. Thus

$$w_0 = \sum_{y \in X} q(y)\, w(y) , \tag{22}$$

with $0 \le q(y) \le 1$, $\sum_y q(y) = 1$. We further assume that $\|w_0\| \le \rho$ for some positive $\rho$,
where the norm is the maximum norm, as in Eq. (5). Then,

(i) There exist $m$ independent linear forms on $\mathbb{R}^m$, $h_1, \ldots, h_m$, such that $h_j(w(y)) \ge 0$,
$h_j(w_0) \le \rho$. Moreover, $h_j(w(y)) = 0$, $\forall j$, if and only if $y$ is in the nonempty set, $\{y\}$,
associated with the extremal point 0. These forms can be scaled so as to give the same
distances that we now use in $\mathbb{R}^m$.

(ii) Let $a$ be a strictly positive real number. Define

$$X(a) = \{ y \in X \mid h_j(w(y)) \le a \text{ for all } j = 1, \ldots, m \} . \tag{23}$$

Then

$$\sum_{y \notin X(a)} q(y) \le \frac{m\rho}{a} . \tag{24}$$

Proof: Assertion (i) comes from the fact that zero is an extremal point of the convex hull of
the vectors $w(y)$ for $y \in X$. The linear forms are essentially a local coordinate system with
the extremal at the vertex. Positive coordinates, not necessarily orthogonal, can be defined
for this cone. Since the scale of this coordinate system in general includes an arbitrary
multiplicative constant, it can be taken to be the same as that of the original $\mathbb{R}^m$. Therefore
the coordinates of $w_0$ in this system remain of order $\rho$.

Assertion (ii): One has, by the hypotheses of the lemma,

$$\rho \ge h_j(w_0) = \sum_{y \in X} q(y)\, h_j(w(y)) \ge \sum_{\{y \in X \mid h_j(w(y)) > a\}} a\, q(y) .$$

Thus,

$$\sum_{y \notin X(a)} q(y) \le \sum_{j=1}^{m} \Bigl( \sum_{\{y \in X \mid h_j(w(y)) > a\}} q(y) \Bigr) \le \frac{m\rho}{a} . \tag{25}$$

This proves the lemma.

Remark: In practice, the "$m$" appearing in Eq. (25) may be a severe overestimate. This
is because points in other phases will generally exceed $a$ for all components of the linear
forms.
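The estimate of the lemma is a Markov-type inequality, and a small numerical instance may make it concrete. In the sketch below (plain Python) the vectors $w(y)$ are chosen with non-negative coordinates, so that the coordinate functions themselves can serve as the linear forms $h_j$. The vectors, the weights, and the threshold `a` are all assumptions chosen for illustration.

```python
# Vectors w(y) in R^2 with non-negative coordinates, so the coordinate
# functions h_1(w) = w[0], h_2(w) = w[1] can play the role of the linear forms.
w = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.02), (1.0, 0.5), (0.3, 1.0)]
q = [0.5, 0.2, 0.2, 0.05, 0.05]            # convex weights, summing to 1
m = 2

# w0 = sum_y q(y) w(y) and its maximum norm rho.
w0 = tuple(sum(qy * wy[j] for qy, wy in zip(q, w)) for j in range(m))
rho = max(abs(c) for c in w0)

a = 0.25
# Weight of the points outside X(a), i.e. with some coordinate exceeding a.
outside = sum(qy for qy, wy in zip(q, w) if any(wy[j] > a for j in range(m)))
bound = m * rho / a                         # right-hand side of the lemma's estimate
```

Here the weight outside $X(a)$ is 0.1 while the bound is 0.632, which also illustrates the remark that the factor $m$ can be a considerable overestimate.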

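Application of the lemma: For every $\ell$, $1 \le \ell \le m+1$, we apply the lemma to the set of
vectors and numbers

$$w(y) = E_\ell - A(y) \tag{26}$$

$$w_0 = E_\ell - \bigl(\lambda_1^t A_1(y^*_\ell),\, \lambda_2^t A_2(y^*_\ell),\, \ldots,\, \lambda_m^t A_m(y^*_\ell)\bigr) \tag{27}$$

$$q(y) = p^t_{y^*_\ell}(y) \tag{28}$$

$$\rho = (1 - \lambda_m^t)\, \|E_\ell\| . \tag{29}$$

These vectors and numbers satisfy the hypotheses of the lemma, because 0 is extremal for
the convex hull of the translated vectors $E_\ell - A(y)$ (see Eq. (21)), and because
$\|w_0\| \le (1 - \lambda_m^t)\|E_\ell\|$ (each component of $w_0$ is $(1 - \lambda_k^t) A_k(y^*_\ell)$), with $\|E_\ell\| \le 1$.
By the lemma, we can define for every
$\ell$, $1 \le \ell \le m+1$, $m$ independent linear forms $h_{\ell,j}$, $j = 1, \ldots, m$, such that if we define

$$X_\ell(a_\ell) = \{ y \in X \mid h_{\ell,j}(E_\ell - A(y)) \le a_\ell ,\; j = 1, \ldots, m \} , \tag{30}$$

for a positive real number, $a_\ell$, then from Eq. (30) and the lemma

$$\sum_{y \in X_\ell(a_\ell)} p^t_{y^*_\ell}(y) \ge 1 - \frac{m}{a_\ell}\,(1 - \lambda_m^t)\,\|E_\ell\| = 1 - \frac{m}{\delta_\ell}\,(1 - \lambda_m^t) , \tag{31}$$

where $\delta_\ell = a_\ell / \|E_\ell\|$.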

We now make use of hypothesis S. Because $\epsilon \ll \Phi$ we can find $\delta_\ell$ such that $\epsilon \ll \delta_\ell \ll \Phi$.
This allows the probability of remaining within a phase to be large (as in Eq. (31)), while
maintaining near constancy of the $A_k$'s on that phase. Specifically, from Eq. (30) we have

$$|A_k(x) - A_k(x')| < a_\ell = \delta_\ell \|E_\ell\| \ll \Phi \|E_\ell\| \le \Phi_\ell \le O(1) , \quad \text{with } x, x' \in X^{(\ell)} . \tag{32}$$

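We next establish that these phases nearly exhaust $X$ in the sense of the probability
measure $p_0$. The spectral decomposition of $R$ gives

$$p^t_{y^*_\ell}(y) = R^t_{y y^*_\ell} = p_0(y) + \sum_{k=1}^{m} \lambda_k^t\, p_k(y)\, A_k(y^*_\ell) + |\lambda_{m+1}^t|\, B^{(t)}_{y y^*_\ell} . \tag{33}$$

But

$$\sum_\ell \mu_\ell E_\ell = 0 , \tag{34}$$

with $0 \le \mu_\ell \le 1$, $\sum_{\ell=1}^{m+1} \mu_\ell = 1$. By the definition of $E_\ell$ this means

$$\sum_{\ell=1}^{m+1} \mu_\ell\, A_k(y^*_\ell) = 0 , \quad 1 \le k \le m . \tag{35}$$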

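We next sum Eq. (33) for each $\ell$ with coefficient $\mu_\ell$ to give

$$\sum_{\ell=1}^{m+1} \mu_\ell\, p^t_{y^*_\ell}(y) = p_0(y) + 0 + |\lambda_{m+1}^t| \sum_{\ell=1}^{m+1} \mu_\ell\, B^{(t)}_{y y^*_\ell} = p_0(y) + O(\eta) . \tag{36}$$

Now sum over $y \in \bigcup_k X_k(a_k)$, i.e., all $y$ in the phases, and interchange sums,

$$\sum_{\ell=1}^{m+1} \mu_\ell \sum_{y \in \bigcup_k X_k(a_k)} p^t_{y^*_\ell}(y) = \sum_{y \in \bigcup_k X_k(a_k)} p_0(y) + O(\eta) . \tag{37}$$

Then we deduce

$$\sum_{y \in \bigcup_k X_k(a_k)} p^t_{y^*_\ell}(y) = \sum_{y \in X_\ell(a_\ell)} p^t_{y^*_\ell}(y) + \sum_{y \in \bigcup_{k \neq \ell} X_k(a_k)} p^t_{y^*_\ell}(y) \ge \sum_{y \in X_\ell(a_\ell)} p^t_{y^*_\ell}(y) \ge 1 - \frac{m}{\delta_\ell}\,(1 - \lambda_m^t) , \tag{38}$$

where the last step uses Eq. (31). By Eq. (37) and Eq. (38) (and using $\delta = \min_\ell \delta_\ell$),

$$\sum_{y \in \bigcup_k X_k} p_0(y) = \sum_\ell \mu_\ell \sum_{y \in \bigcup_k X_k} p^t_{y^*_\ell}(y) + O(\eta) \tag{39}$$

$$\ge \sum_\ell \mu_\ell \sum_{y \in X_\ell} p^t_{y^*_\ell}(y) + O(\eta) \tag{40}$$

because $\sum_\ell \mu_\ell = 1$. Therefore by Eq. (25)

$$\sum_{y \in \bigcup_\ell X_\ell(a_\ell)} p_0(y) \ge 1 - \frac{m}{\delta}\,(1 - \lambda_m^t) + O(\eta) . \tag{42}$$

This proves statement (11) with $X^{(\ell)} = X_\ell(a_\ell)$.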
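On each phase $X^{(\ell)}$, one has by definition,

$$0 \le h_{\ell,j}\bigl(A(y^*_\ell) - A(y)\bigr) \le a_\ell , \quad j = 1, \ldots, m . \tag{43}$$

This implies that the coordinates of the vector $A(y^*_\ell) - A(y)$, for $y \in X^{(\ell)}$, are $O(a_\ell)$ for
$k = 1, \ldots, m$, because the $h_{\ell,j}$ are linearly independent forms. This establishes the near
constancy of the $A_k$ on each phase (property 2 of Sec. II C).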

Remark: If "$m$" is large (for example where many metastable phases are present) the factor
$m\epsilon$ may not be smaller than one. In that case there may not be the clean separation of
phases that we discuss here.

E. Proof of the uniqueness of the phases 

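We start from Eq. (2) for $A_k$ and $R^t$,

$$\lambda_k^t A_k = A_k R^t , \quad 1 \le k \le m . \tag{44}$$

The right hand side of this equation can be split by summing over each phase $X^{(\ell)}$ and over
the set of points of $X$ outside of any phase:

$$\lambda_k^t A_k(y) = \sum_{\ell=1}^{m+1} \Bigl( \sum_{x \in X^{(\ell)}} A_k(x)\, R^t_{xy} \Bigr) + \sum_{x \notin \bigcup_\ell X^{(\ell)}} A_k(x)\, R^t_{xy} . \tag{45}$$

We next estimate each sum in Eq. (45). One has $|A_k(x)| \le 1$, by normalization. Moreover,

$$\sum_{x \notin \bigcup_\ell X^{(\ell)}} R^t_{xy} \le K\, \frac{1 - \lambda_m^t}{\delta} + O(\eta) , \tag{46}$$

where $K$ is $O(1)$, as will be shown below in Sec. II G and in particular Eq. (61). Choose a
point $x^{(\ell)} \in X^{(\ell)}$ for $1 \le \ell \le m+1$. One has

$$\sum_{x \in X^{(\ell)}} A_k(x)\, R^t_{xy} = A_k(x^{(\ell)}) \sum_{x \in X^{(\ell)}} R^t_{xy} + \sum_{x \in X^{(\ell)}} \bigl( A_k(x) - A_k(x^{(\ell)}) \bigr) R^t_{xy} . \tag{47}$$

The last sum in Eq. (47) is $O(a_\ell) \sum_{x \in X^{(\ell)}} R^t_{xy}$, using Eq. (32), which is in turn $O(a_\ell)$, since
$\sum_{x \in X^{(\ell)}} R^t_{xy} \le 1$. Therefore Eq. (45) becomes

$$\lambda_k^t A_k(y) = \sum_{\ell=1}^{m+1} \Bigl( \sum_{x \in X^{(\ell)}} R^t_{xy} \Bigr) A_k(x^{(\ell)}) + O(a) + O(\eta) , \tag{48}$$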

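with $a = \max_{1 \le \ell \le m+1} a_\ell \ll O(1)$. Moreover, $\lambda_k^t = 1 + O(\epsilon)$, so that finally

$$A_k(y) = \sum_{\ell=1}^{m+1} q_\ell(y)\, A_k(x^{(\ell)}) + O(\eta) + O(a) \tag{49}$$

with

$$q_\ell(y) = \sum_{x \in X^{(\ell)}} R^t_{xy} . \tag{50}$$

$q_\ell(y)$ is the probability that, starting from $y \in X$, the system is in phase $X^{(\ell)}$ at time $t > 1$
[14]. The system of Eq. (49) is a vector equation

$$A(y) = \sum_{\ell=1}^{m+1} q_\ell(y)\, A(x^{(\ell)}) + O(a) . \tag{51}$$

Moreover, one has

$$\sum_{\ell=1}^{m+1} q_\ell(y) = \sum_{\ell=1}^{m+1} \sum_{x \in X^{(\ell)}} R^t_{xy} = 1 - \sum_{x \notin \bigcup_\ell X^{(\ell)}} R^t_{xy} = 1 + O(\epsilon) \tag{52}$$

because of Eq. (46).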

Thus Eq. (51) says that the vector $A(y)$ for any $y$ is in the convex hull generated by
the $m+1$ vectors $A(x^{(\ell)})$, up to $O(a)$ corrections. Therefore, up to $O(a)$, there can be no
extremal points except those already among the $\{x^{(\ell)}\}$, since otherwise Eq. (51) would be
an expression for one extremal point in terms of the others. This implies that the phases
are unique (up to $O(a)$).

F. Barycentric coordinates 

Eq. (51) says that the vector $A(y)$ has barycentric coordinates $q_\ell(y)$ defined by Eq. (50),
with respect to the vectors $A(x^{(\ell)})$ in each phase $X^{(\ell)}$, where it does not matter which
$x^{(\ell)} \in X^{(\ell)}$ is chosen, up to errors of order $a$. Moreover, by Eq. (50), $q_\ell(y)$ is the probability
that, starting from $y$, the state of the system is in phase $X^{(\ell)}$ at time $t$ (where $t$ is the time
scale used to define the phases).

But this means that one can calculate these probabilities $q_\ell(y)$ in a geometric manner.
We need to calculate the first $m$ left eigenvectors $A_1(y), \ldots, A_m(y)$ (after the trivial $A_0$).
This is sufficient to define the phases by our construction. Using these phases the $q_\ell(y)$ are
the barycentric coordinates of $A$ with respect to the phases. Thus, the spectral properties
and convex hull construction provide a way to calculate the probability of reaching classes
of intermediate asymptotic (time-scale $t$) states, that is to say, the phases.
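The geometric computation of the $q_\ell(y)$ amounts to solving a small linear system. The sketch below (Python with NumPy) illustrates it on a toy chain; the helper name `barycentric`, the block matrix `M`, the jump probabilities `a`, and the shortcut of taking one known representative state per block as the extremal point are all assumptions made for this example. For simplicity the extra state is never revisited, so this toy matrix is not irreducible.

```python
import numpy as np

def barycentric(vertices, point):
    """Solve sum_l q_l V_l = point together with sum_l q_l = 1 for the m+1 weights."""
    V = np.asarray(vertices, dtype=float)      # shape (m + 1, m)
    lhs = np.vstack([V.T, np.ones(len(V))])    # (m + 1) x (m + 1) linear system
    rhs = np.append(np.asarray(point, dtype=float), 1.0)
    return np.linalg.solve(lhs, rhs)

# Toy chain (assumption): 3 blocks of n states plus one extra state that jumps
# to block i with probability a[i] and is never revisited.
M = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.97, 0.02],
              [0.01, 0.02, 0.97]])
a = np.array([0.5, 0.3, 0.2])
n = 4
N = 3 * n + 1
R = np.zeros((N, N))
R[:3 * n, :3 * n] = np.kron(M, np.full((n, n), 1.0 / n))
R[:3 * n, -1] = np.repeat(a / n, n)            # column of the extra state

vals, vecs = np.linalg.eig(R.T)                # left eigenvectors of R
idx = np.argsort(-np.abs(vals))[1:3]           # the two slowest nontrivial modes
A = np.real(vecs[:, idx])
A = A / np.max(np.abs(A), axis=0)              # max-norm normalization

vertices = A[[0, n, 2 * n]]                    # one representative state per block
q = barycentric(vertices, A[-1])               # coordinates of the extra state
```

In this example `q` reproduces the jump probabilities `a` to within about one percent, the discrepancy being of the order of $1 - \lambda_k$, in line with the corrections discussed above.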
G. Weight estimates inside and outside the phases

We wish to establish Eq. (46).

1. For each $k$, $1 \le k \le m+1$, the following relation is satisfied by the points inside the
phase

$$\sum_{x \in X^{(k)}} p^t_{y^*_k}(x) \ge 1 - \frac{m}{\delta}\,(1 - \lambda_m^t) \tag{53}$$

[cf. Eq. (31)].

2. We now consider arbitrary $x$ (not necessarily in one of the phases). By the spectral
decomposition

$$p^t_{y^*_k}(x) = p_0(x) + \sum_{\ell=1}^{m} \lambda_\ell^t\, p_\ell(x)\, A_\ell(y^*_k) + |\lambda_{m+1}^t|\, B^{(t)}_{x y^*_k} , \quad 1 \le k \le m+1 . \tag{54}$$

With the notation

$$a_{n\ell} = A_{n-1}(y^*_\ell) , \qquad p'_k = p_{k-1} , \qquad P_j(x) = p^t_{y^*_j}(x) , \tag{55}$$

Eq. (54) becomes

$$P_k(x) = \sum_{\ell=1}^{m+1} p'_\ell(x)\, a_{\ell k} + O(\epsilon) + O(\eta) , \quad 1 \le k \le m+1 . \tag{56}$$

This is a linear system of $m+1$ equations for the $p'_k(x)$, and as a consequence, one can solve
for those quantities:

$$p'_k(x) = \sum_{\ell=1}^{m+1} P_\ell(x)\, c_{\ell k} + O(\epsilon) + O(\eta) \tag{57}$$

with $c = a^{-1}$. That is, the first $m+1$ right eigenvectors are expressible in terms of the
distributions within each phase. This is the generalization of Eq. (3.7) of [1].

Remark: From this relation we can also see that each $p_k$ (of the first $m+1$ right eigenvectors)
is proportional to $p_0$ on each phase; that is, $p_k(x) \approx \text{const} \cdot p_0(x)$ for $\{x\}$ that constitute
a single phase. First recall that each $p^t_{y^*_\ell}(x)$ looks locally like $p_0$ on its phase, because $t$ is
assumed large enough that local equilibration is complete (that's the smallness of $\lambda_{m+1}^t B$).
On the other hand, outside phase-$\ell$ each $p^t_{y^*_\ell}(x)$ is nearly zero, since starting from the extremal on
a time scale such that $\lambda_m^t$ is still close to unity there is little escape. Therefore, by Eq. (57),
each $p_k$ (or $p'_k$) is simply given as a sum of such $p^t_{y^*_\ell}(x)$ (or $P_\ell(x)$).

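3. Next take the points $x$ outside all phases and write again, for any $y'$ (inside or outside
of any phase)

$$p^t_{y'}(x) = p_0(x) + \sum_{\ell=1}^{m} \lambda_\ell^t\, p_\ell(x)\, A_\ell(y') + |\lambda_{m+1}^t|\, B^{(t)}_{x y'} . \tag{58}$$

In Eq. (58) replace the $p_k(x)$'s by their values given in Eq. (57):

$$p^t_{y'}(x) = \sum_{k=0}^{m} \sum_{\ell=1}^{m+1} \lambda_k^t\, A_k(y')\, c_{\ell, k+1}\, P_\ell(x) + |\lambda_{m+1}^t|\, C^{(t)} , \tag{59}$$

where $C^{(t)}$ is a combination of the various $B^{(t)}$ and is assumed to be bounded. But from
Eq. (53)

$$\sum_{x \notin X^{(\ell)}} P_\ell(x) \le \frac{m}{\delta}\,(1 - \lambda_m^t) , \tag{60}$$

so from Eq. (59)

$$\sum_{x \notin \bigcup_\ell X^{(\ell)}} p^t_{y'}(x) \le K\, \frac{1 - \lambda_m^t}{\delta} + O(\eta) , \tag{61}$$

where $K$ depends on the $c_{\ell k}$ and thus only on the geometry of the $A(y^*_\ell)$ and is therefore
$O(1)$. Therefore at time $t$, the probability of starting from $y'$ (for any $y'$) to be outside of
all phases is small, when $t$ is such that

$$\frac{1 - \lambda_m^t}{\delta} \ll O(1) , \quad \text{and} \quad |\lambda_{m+1}^t|\, |B^{(t)}| \ll O(1) . \tag{62}$$

4. We know that

$$\sum_{x \notin \bigcup_\ell X^{(\ell)}} p_0(x) \le \frac{K}{\delta}\,(1 - \lambda_m^t) + O(\eta) , \tag{63}$$

which follows from Eq. (61) by averaging over $y'$ with weight $p_0(y')$, since $p_0 = R^t p_0$.
For this last estimate, the left hand side does not depend on $t$, so that the right hand side
can be evaluated at any value of $t$ for which Eq. (62) is valid.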



H. Basins of attraction 

As noted above, our phases are not only the usual states in a phase, but also include
the short-time-scale basins of attraction for those phases. In this context "short-time-scale"
means $O(\eta)$, that is, at worst, the next slower mode after $\lambda_m$. A particular example of this
can occur when the extremum point itself is not in what one usually calls the phase but only
gets there in $O(\eta)$. In this behavior, the extremal is like the points that are not uniquely
identified with any phase and have non-zero probabilities for going to several. (See below,
Fig. 2, for an example of this.) The only difference between these intermediate points and
a not-in-the-phase extremal is that the barycentric coordinates for such an extremal have a
single 1, with the other entries zeros. The distinction between basin-of-attraction states and
those conventionally assigned to the phase will lie in the size dependence of $p_0(x)$, which
assumes that one is in a conventional context, where a thermodynamic limit is contemplated.

An illustration of an extremal that does not lie in the usual phase can be found in our
article on the definition of coarse grains, [3]. There we analyzed a 1-dimensional Ising
model. Although there is no conventional phase transition in this system, there is a marked
difference in system behavior for temperature above and below the value $T = 1$ (in the units
we use there) even for moderate values of the number of spins on the ring. In that article we
plotted the left eigenvector, $A_1(x)$, as a function of magnetization. That is, $x = (\sigma_1, \ldots, \sigma_N)$
is a spin configuration, with each $\sigma_k = \pm 1$, and the magnetization is $\sum_k \sigma_k$. Two plots,
one below $T = 1$ and one above, are shown in Fig. 1. In [3] we made the point that the
magnetization emerges from the slow eigenvalues ($\lambda$ near 1) in a natural way. Surprisingly
(to us) this turned out to be more marked above $T = 1$ than below. The close relation of
$A_1$ to magnetization is evident in the $T > 1$ figure. Below $T = 1$ there was a bunching
of values near the maximum (and symmetrically about the minimum) [15]. The maximum
of $A_1$, which we use to label the phase, occurs for the state with maximum magnetization,
which, using analyticity-related definitions, is not part of the phase in the thermodynamic
limit [16]. Nevertheless, this maximum of $A_1$ (for $T < 1$) differs little from $A_1(x)$ for other
points, $x$.

FIG. 1: Ising model stochastic dynamics. $A_1$ versus magnetization. On the left is shown $T > 1$,
on the right $T < 1$. In this figure an $L^2$ normalization is used for $A_k$ (see [5]).



I. Detailed balance for the principal portions of $R^t$

As above, assume that the first $m+1$ eigenvectors of $R$ are real and that in the spectral
expansion the rest of $R$ is small. Then $R$ itself nearly satisfies detailed balance. That is, for
the truncated $R$, $J_{xy} \equiv R^t_{xy}\, p_0(y) - R^t_{yx}\, p_0(x)$ is small.

The truncated spectral expansion for $R$ is (cf. Eq. (33)) $R^t_{xy} = p_0(x) + \sum_{k=1}^{m} \lambda_k^t\, p_k(x) A_k(y)$,
from which we have

$$J_{xy} = R^t_{xy}\, p_0(y) - R^t_{yx}\, p_0(x) = \sum_{k=1}^{m} \lambda_k^t \bigl[ p_k(x) A_k(y)\, p_0(y) - p_k(y) A_k(x)\, p_0(x) \bigr] . \tag{64}$$

Case 1: $x$ and $y$ are in the same phase. Up to $O(\eta)$, $A_k(y) = A_k(x)$, so that every term in
the sum contains differences, $p_k(x) p_0(y) - p_k(y) p_0(x)$. However, by the Remark following
Eq. (57), on any particular phase $p_k(x)$ is proportional to $p_0(x)$, so these differences are zero.
Case 2: they are not in the same phase: then $R^t_{xy}$, which is $p^t_y(x)$, is close to zero (on the
order of $\epsilon$). Therefore $J_{xy}$ is also close to zero.

The foregoing observation must be used with caution. It only asserts that $J_{xy}$ is small
on the scale of $\eta$ or of $\epsilon$. However, for times, $\tau$, such that $1 - \lambda_m^\tau$ is not small, this conclusion
does not hold.
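The current $J_{xy}$ is straightforward to evaluate for any stochastic matrix and its stationary distribution. The following minimal sketch (Python with NumPy) does this for a 3-state cycle with a preferred direction of circulation. The function name `current` and the example matrix are assumptions chosen so that detailed balance visibly fails.

```python
import numpy as np

def current(R, p0):
    """J[x, y] = R[x, y] p0[y] - R[y, x] p0[x]; J vanishes iff detailed balance holds."""
    flux = R * p0[np.newaxis, :]           # flux[x, y] = R[x, y] p0[y]
    return flux - flux.T

# Illustrative 3-state cycle (assumption): columns and rows both sum to 1,
# so the stationary distribution is uniform.
R = np.array([[0.5, 0.1, 0.4],
              [0.4, 0.5, 0.1],
              [0.1, 0.4, 0.5]])
p0 = np.full(3, 1.0 / 3.0)

J = current(R, p0)
```

To examine the current on the time scale $t$ used in the text, one would pass `np.linalg.matrix_power(R, t)` in place of `R`.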
Remark: Although detailed balance implies that all eigenvalues of $R$ are real, the converse
is not true. (Thus the mere fact that all eigenvalues in the truncated $R$ are real does not
already imply detailed balance.) From numerical exploration we have indeed found that
there is a correlation between $\sum |J_{xy}|$ and $\sum |\mathrm{Im}\, \lambda_k|$ (for matrices generated in a certain
random way); nevertheless it does happen that $\sum |\mathrm{Im}\, \lambda_k| = 0$, while $\sum |J_{xy}|$ is not [17].

J. A reduced stochastic process 

Until now we have focused on a time scale "$t$" such that $1 - \lambda_m^t \ll 1$. Now take a
time, $T$, such that this is not the case. For such $T$, we can drop $O(\eta)$ and $O(\epsilon)$ terms in
our representation of $R$, but nevertheless need to retain $1 - \lambda_k^T$. This also implies that for
$k = 1, \ldots, m$, $A_k(y)$ can be replaced by $A_k(y^*_\ell)$ for $y \in X^{(\ell)}$. Furthermore, for these longer
times, points not in any phase can be dropped, since they have long before made their way
to one or another phase.

Start with the standard spectral representation (dropping $O(\eta)$ and $O(\epsilon)$)

$$R^T_{xy} = \sum_{k=0}^{m} \lambda_k^T\, p_k(x)\, A_k(y^*_j) , \quad \text{for } y \in X^{(j)} . \tag{65}$$

Take $x \in X^{(k)}$ and do a standard coarse graining [3]:

$$\tilde{R}(k, j) = \sum_{x \in X^{(k)},\, y \in X^{(j)}} R^T_{xy}\, \frac{p_0(y)}{\mu(j)} = \sum_{\ell=1}^{m+1} \lambda_{\ell-1}^T \Bigl( \sum_{x \in X^{(k)}} p'_\ell(x) \Bigr) \Bigl( \sum_{y \in X^{(j)}} a_{\ell j}\, \frac{p_0(y)}{\mu(j)} \Bigr) , \tag{66}$$

where $\mu(j) = \sum_{x \in X^{(j)}} p_0(x)$. (N.B. $\mu(k)$, the measure, and $\mu_k$, the barycentric coordinate,
are not the same.) Recall from the observation following Eq. (57) that $p'_k(x)$ is proportional
to $p_0(x)$ in each phase. Define the proportionality constant by $p'_\ell(x) = \rho_{\ell k}\, p_0(x)$ (or
$p_{\ell-1}(x) = \rho_{\ell k}\, p_0(x)$) for $x \in X^{(k)}$. Doing the sums, Eq. (66) becomes

$$\tilde{R}(k, j) = \sum_{\ell=1}^{m+1} \lambda_{\ell-1}^T\, \rho_{\ell k}\, \mu(k) \Bigl( a_{\ell j}\, \frac{\mu(j)}{\mu(j)} \Bigr) = \sum_{\ell=1}^{m+1} \lambda_{\ell-1}^T\, \rho_{\ell k}\, \mu(k)\, a_{\ell j} . \tag{67}$$

The fact that $\sum_k \tilde{R}(k, j) = 1$ follows immediately by summing over $x$ in Eq. (65).

The stochastic matrix $\tilde{R}$ therefore describes the transitions between phases on a
much-elongated time scale.
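The coarse graining of Eq. (66) can be carried out directly once the phases are known. The sketch below (Python with NumPy) does so for a toy chain in which every state belongs to a phase; the function name `coarse_grain`, the block matrix `M`, the block size `n`, and the time `T` are illustrative assumptions. For this idealized chain the reduced matrix is simply the $T$-th power of the block matrix, which provides a check.

```python
import numpy as np

def coarse_grain(R_T, p0, phases):
    """R_tilde[k, j] = sum over x in X^(k), y in X^(j) of R_T[x, y] p0[y] / mu(j)."""
    K = len(phases)
    R_tilde = np.zeros((K, K))
    for j, Xj in enumerate(phases):
        mu_j = p0[Xj].sum()                 # measure of phase j
        for k, Xk in enumerate(phases):
            R_tilde[k, j] = (R_T[np.ix_(Xk, Xj)] @ p0[Xj]).sum() / mu_j
    return R_tilde

# Toy chain (assumption): 3 blocks of n states, uniform target inside each block.
M = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.97, 0.02],
              [0.01, 0.02, 0.97]])
n = 4
R = np.kron(M, np.full((n, n), 1.0 / n))
p0 = np.full(3 * n, 1.0 / (3 * n))          # uniform, because this toy R is symmetric
phases = [list(range(b * n, (b + 1) * n)) for b in range(3)]

T = 20                                       # a time for which 1 - lambda_k**T is not small
R_tilde = coarse_grain(np.linalg.matrix_power(R, T), p0, phases)
```

The columns of `R_tilde` sum to one, as stated after Eq. (67). In a realistic example the states outside all phases would first be dropped, as described above.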

III. ILLUSTRATIONS 

In this section we illustrate the general principles with specific numerical examples. 

A. Multiple phases with relatively rapid internal relaxation 

In Fig. 2 we show a three phase situation.
For simplicity we work with the matrix "$W$", discussed in the Remark just before Sec. II C.
The form of the matrix corresponding to this figure is schematically

$$\widetilde{W} = \begin{pmatrix} W_1 & e & e & a \\ e & W_2 & e & a \\ e & e & W_3 & a \\ e & e & e & 0 \end{pmatrix} + \epsilon \cdot \text{random} , \qquad W = \widetilde{W} - \mathrm{diag}\Bigl( \sum \widetilde{W} \Bigr) . \tag{68}$$

Thus 3 random matrices are produced and weakly coupled to one another: "$e$" is a generic
small matrix and need not be the same matrix for each appearance in $\widetilde{W}$. Then additional
states are added to the state space (those appearing after $W_3$). These have large one-way
couplings to the other states ("$a$", a generic not-small matrix which again need not be the
same in each of its appearances in $\widetilde{W}$) plus small probabilities of return. Next small
transition probabilities are added to be sure the matrix is ergodic. Finally the actual stochastic
generating matrix, $W$, is computed from $\widetilde{W}$ by forming column sums and subtracting those
sums on the diagonal (where for $A$ an $n \times n$ matrix, $\sum A$ is the $n$-component object $\sum_x A_{xy}$
and "diag" is a diagonal $n \times n$ matrix with its argument on the diagonal and zeros elsewhere).
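A matrix with the schematic structure of Eq. (68) can be assembled as follows. This is only a sketch in Python with NumPy under stated assumptions: the function name `build_generator`, the block sizes, the coupling strengths, and the use of uniform random entries are illustrative choices and are not claimed to reproduce the matrices behind the figures.

```python
import numpy as np

def build_generator(block_sizes, n_extra, eps, strong, rng):
    """Assemble a generator with the block structure of Eq. (68).

    W[x, y] is the rate for the transition y -> x.
    """
    n_phase = sum(block_sizes)
    N = n_phase + n_extra
    Wt = eps * rng.random((N, N))             # weak couplings everywhere (ergodicity)
    start = 0
    for size in block_sizes:                  # strong random rates inside each block W_k
        stop = start + size
        Wt[start:stop, start:stop] = rng.random((size, size))
        start = stop
    # Strong one-way rates from the extra states into the phase states ("a").
    Wt[:n_phase, n_phase:] = strong * rng.random((n_phase, n_extra))
    np.fill_diagonal(Wt, 0.0)
    return Wt - np.diag(Wt.sum(axis=0))       # subtract the column sums on the diagonal

rng = np.random.default_rng(0)
W = build_generator([20, 20, 20], n_extra=5, eps=1e-3, strong=1.0, rng=rng)

# Real parts of the spectrum, largest first; eigs[0] is the stationary eigenvalue 0.
eigs = np.sort(np.real(np.linalg.eigvals(W)))[::-1]
```

With these parameter choices one expects two eigenvalues close to the stationary one, followed by a clear gap, which is the spectral signature of three phases. Changing `eps` relative to the in-block rates narrows or widens that gap.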

Because this is a three-phase system (by construction) it is sufficient to take $A(x)$ to be
2-dimensional, i.e., we plot only $A_1(x)$ versus $A_2(x)$ for all $x \in X$. This is Fig. 2. The vertices
of the triangle consist of the large number of points (dimensions of the spaces associated
with $W_k$, k = 1,2,3), while near the middle of the triangle are the points associated with
the additional dimensions in the lower right hand corner of W. Because the form of $a$ was
approximately the same for all of them, they are near one another. If any of these points
is expressed in barycentric coordinates with respect to the vertices, the coefficients give the
probability that starting from this point one reaches the respective phase (vertex). To see
that the vertices are actually blurred a bit, we plot in Fig. 3 a close-up of one of the vertices.
This is the same matrix as in Fig. 2.
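
Reading off barycentric coordinates amounts to solving a small linear system. The sketch below assumes the m + 1 vertex positions have already been identified in the plot; the function name and the sample triangle are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

def barycentric_coordinates(point, vertices):
    """Solve sum_k mu_k * vertices[k] = point subject to sum_k mu_k = 1.

    vertices: (m + 1) points in m dimensions; point: one point in m dimensions.
    """
    vertices = np.asarray(vertices, dtype=float)   # shape (m + 1, m)
    point = np.asarray(point, dtype=float)         # shape (m,)
    # Append a row of ones to impose the normalization sum_k mu_k = 1.
    system = np.vstack([vertices.T, np.ones(len(vertices))])
    rhs = np.append(point, 1.0)
    return np.linalg.solve(system, rhs)

# Hypothetical triangle standing in for the three phase vertices.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
mu = barycentric_coordinates((0.2, 0.3), triangle)
print(mu)  # approximately [0.5, 0.2, 0.3]
```

In this toy case the coefficients 0.5, 0.2 and 0.3 would be interpreted as the probabilities of ending in the first, second and third phase respectively.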

In the next figures we do the same exercise but with four phases, that is, Eq. (68) is
modified by putting in a fourth block, $W_4$. Fig. 4 shows the real part of the spectrum of
W. The spectrum is similar to the 3-phase case, and as can be seen from the numbers, one
does not require extremes of magnitude, large or small, to get useful information from the
geometrical construction. By construction, the "W" of Fig. 4 has three eigenvalues near the
stationary one (0), leading to four phases. Fig. 5 shows the convex hull of the points $A(x)$.
If one cuts off the plot in too low a dimension one gets what is seen in Fig. 6. Here only $A_1$
and $A_2$ are plotted and as can be seen there are four rather than three extrema. This is an
illustration of our need to have $\lambda_{m+1}$ much smaller than those preceding it. If this condition
is not satisfied, more extrema appear. This is what Sec. II E was all about.

B. Hierarchical phases, no sharp cutoff in eigenvalue; simplified spin glasses 

For two classes of phenomena we do not expect the eigenvalues to drop off suddenly, as 
discussed in connection with first order phase transitions. For spin glasses there is expected 
to be a hierarchical sequence of metastable states. For critical points the eigenvalues should 
have a power law dropoff near the stationary state. 

For hierarchical structures, already studied by us in [5], we do a variant of the geometrical

FIG. 2: Plot using the first two left eigenvectors ($A_1$ and $A_2$) of the transition matrix, R, for a
three-phase system. A circle is placed at each point $(A_1(x), A_2(x))$ for each of the N states, x, in
X. The lines connecting the circles are for visualization. The matrix R is generated by combining
4 blocks, 3 of which are random matrices, the fourth essentially zero. Then a bit of noise is
added throughout, with bigger terms for migration out of the fourth block. Finally the diagonal
is adjusted to make the matrix stochastic. This leads to a pair of eigenvalues near one. This
plot using the first two eigenvectors shows the extremal points to be clustering in three regions,
corresponding to the phases. The points not at the extremals represent the fourth block, all of
which head toward one or another phase under the dynamics. For the particular matrix chosen,
they are about as likely to end in one phase as another. For all eigenvector plots, the quantities
plotted are pure numbers whose scale is set by our normalization convention, discussed in Sec. II B.



construction just displayed. The overall W matrix has the following form

$$
W =
\begin{pmatrix}
W_1 & \epsilon & \epsilon \\
\epsilon & W_2 & \epsilon \\
\epsilon & \epsilon & W_3
\end{pmatrix} ,
\quad \text{with each } W_k \text{ of the form} \quad
W_k =
\begin{pmatrix}
w_1 & \delta \\
\delta & w_2
\end{pmatrix} , \tag{69}
$$

and $\epsilon \ll \delta \ll 1$. For this structure it is instructive to introduce time into the picture. The
vectors to be plotted are $A^t(x) = (A_1(x)\lambda_1^t, \ldots, A_m(x)\lambda_m^t)$. We have built the hierarchy
to have 6 phases; on a medium time scale three pairs of them decay into a common branch,
subsequent to which the three branches merge into a single trunk. Since we cannot image
the 5-dimensional structure, we take the projection of this motion (as a function of time) on
a particular plane. This is shown in Fig. 7, where the circles represent the original phases
and the 'x' is the final state, (0,0).
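
The time-dependent vectors $A^t(x)$ are obtained by scaling each observable by the t-th power of its eigenvalue. A minimal sketch follows; the array layout (one column per observable), the function name, and the toy numbers are assumptions chosen for illustration only.

```python
import numpy as np

def time_scaled_observables(A, eigenvalues, t):
    """Return the array whose (x, l) entry is A_l(x) * lambda_l ** t.

    A: array of shape (N, m), column l holding the observable A_l.
    eigenvalues: the m corresponding (real) eigenvalues of R.
    """
    A = np.asarray(A, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    return A * lam ** t   # broadcasting scales each column by lambda_l ** t

# Toy values: two states, two observables.
A_toy = np.array([[1.0, -1.0],
                  [0.5, 2.0]])
lam_toy = np.array([0.9, 0.5])
A_t2 = time_scaled_observables(A_toy, lam_toy, t=2)
print(A_t2)
```

Coordinates belonging to the smaller eigenvalue shrink first, which is why phases that differ only along fast directions appear to merge as t grows.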

FIG. 3: Detail of the upper left vertex in Fig. 2. In actuality the points in each phase cluster
closely together and more than one extremal might, in principle, occur. The precision is limited
by the non-negligible magnitudes of the quantities $1 - \lambda_2$ and $\lambda_3$.

FIG. 4: The first few eigenvalues of W for the four phase system. For those eigenvalues having an
imaginary part (which is not the case for the first four), only the real part is shown.

C. Asymptotic probabilities 

Consider a random walk on the landscape shown in Fig. 8. The stationary state is shown
in Fig. 9. There are clearly 4 regions of attraction, which we identify as the "phases"
discussed in this article. The spectrum of the (225 by 225) generator of the stochastic
dynamics is [0, exp(-16.0), exp(-15.3), exp(-14.8), exp(+1.2), ...], so that this satisfies the
conditions for having 4 well-demarcated phases, which in this case represent regions of





FIG. 5: (Color online) Convex hull of the set of points $A(y)$ for $y \in X$. This is for a case of 4
phases and the figure formed in $\mathbb{R}^3$ is a tetrahedron.





FIG. 6: (Color online) For the 4-phase case, if one plots only in 2 dimensions one does not see only
3 extremal points. The fourth apparently sticks out of the triangle formed by the others (although
it did not have to) and in a third dimension actually forms a node of a tetrahedron (Fig. 5).

FIG. 7: Phases at successively later times for a hierarchical stochastic matrix. As explained in
the text this is a two-dimensional projection of the five dimensional plot of eigenvectors multiplied
by eigenvalue to the power t. On the shortest time scale there are 6 metastable phases (circles);
subsequently they merge into three and finally into a single stationary state. The axes represent
particular linear combinations of eigenvectors and lack physical dimensions (but have a scale
determined by the normalization of Sec. II B).



attraction. Finally in Fig. 10 we show how the methods of this article can be used to
calculate the probability that from a given initial condition one will arrive at one or another
asymptotic state. Each circle in the graph (which is a 3-dimensional plot of a tetrahedron;
cf. Fig. 5) represents a point on the 15 by 15 lattice and its location in the plot, when
expressed in barycentric coordinates (positive numbers that add to 1) with respect to the
extremals, gives its probability of reaching a particular phase. In the graph we do not
identify the particular circles, but the same computer program that generated the graph can
easily provide a table of probabilities for each initial condition.



IV. PROSPECTS 



The transition matrix for a stochastic process gives rise to observables, namely its slowest 
left eigenvectors (in our convention $R_{xy} = \Pr(x \leftarrow y)$). For each x in the state space
of the process one can form a vector $(A_1(x), \ldots, A_m(x))$ for integer m, with $A_k$ the left
eigenvectors. Depending on the spectrum of R, the space of these vectors can provide a 
graphic demonstration of the phases (in the sense of phase transitions) of this process. We 
call a plot of the points of the state space using a collection of slow (left) eigenvectors an 
observable-representation of state space. 

FIG. 8: (Color online) Landscape for a random walk on a 15 by 15 lattice. The distance unit on
the lattice is arbitrary and the scale of the potential chosen so as to give a dynamical spectrum
illustrating our representation.

FIG. 9: Probability distribution for the stationary state of a walk on the landscape shown in Fig. 8.

FIG. 10: Observable representation, in $\mathbb{R}^3$, of the states for the walk on the landscape shown in
Fig. 8. Each circle represents a point on the 15 by 15 lattice and its position within the tetrahedron
(when expressed in barycentric coordinates with respect to the extremals) gives the probability of
starting at that point and arriving at one or another extremal. The plot is very much like that
shown in Fig. 5 but includes interior points. (Fig. 5 shows only the convex hull.)

We have shown how, when there is a hierarchical structure of phases, that structure becomes
manifest in the space of observables. Our model for demonstrating this is artificially
constructed, but we expect that for systems of greater intrinsic interest the same features 
seen here should emerge. Thus, spin glass models could be considered, for example the 
Sherrington-Kirkpatrick model. Even for local spin glasses, although the state space grows 
large quite rapidly, our method requires little information from the transition matrix, R. For 
example, a 2-dimensional 4 by 4 spin glass would involve a $2^{16}$ by $2^{16}$ matrix ($2^{16}$ = 65 536),
but it is a sparse matrix, and all we would want to know would be the first few eigenvectors, 
which is quite feasible. Now 4x4 may not be much of a lattice, but the knowledge gained 
from the corresponding observables would immediately give information for the longest pos- 
sible time scales. For the mean field Sherrington-Kirkpatrick model one should be able to 
do even better. 

In the more traditional arena of stochastic processes, our geometric construction allows 
one to read off the probabilities of an initial point reaching any of various asymptotic states, 
even when one does not have prior knowledge of what those states are. We gave a simple 
example of a random walk on a multi-well landscape, but other examples easily come to 
mind. 

At the mathematical level, we believe our assumptions are stronger than they need to 
be. It is likely that hypothesis S can be replaced by something weaker. We already have 
preliminary results on this point in low dimension. Another place where we have an implicit 
assumption (although we did not emphasize it at the time) is in the proof of assertion (1) 
of the lemma, where we make a generic assumption about the geometry of the linear forms: 

FIG. 11: Observable representation, in $\mathbb{R}^2$, of the states for Brownian motion. The circle is for the
one dimensional case of a walk on a ring. The rectangular figure is for a two-dimensional random
walk with non-periodic boundary conditions.



specifically that angles in the effective coordinate system are not such that large values of h 
could be generated from small distances. 

Cases where one has eigenvalues near one but there is not a sharp dropoff after one 
particular eigenvalue are of great importance in physical applications. Certainly for spin 
glasses, although they are expected to show the hierarchical structure discussed above, they 
would have a collection of time scales of decreasing size (local relaxation times) with no 
cutoff at a particular value. Also of interest is the case of critical phenomena. Here, absent 
hierarchical structure, we do not expect a small number of extrema to dominate. One also 
has other properties, for example the divergence of spatial correlation lengths, that on the 
face of it do not appear to be directly related to the dynamics. Nevertheless, as shown 
in [3], much of this structure can be recovered from the eigenvectors, so that a dynamical
characterization of this kind of transition, as we have done here for first order transitions, 
may well be possible. 

Even where there is no phase transition, the observable representation can
provide an image of the state space. For the case of Brownian motion on a ring, the
transition matrix simply has constants just off the diagonal, as well as in the corners to
provide periodicity. We showed in [3] how R can recover spatial structure, but a plot of
$A_1(x)$ vs. $A_2(x)$, as in Fig. 11, is even more direct. As can be seen, this immediately gives
the coordinate space ring. For two dimensions we show (in the same figure) a slight variant
in which we have reflecting rather than periodic boundary conditions.
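
The ring example is easy to reproduce. The sketch below is illustrative only: the lattice size, the hopping probability, and the function name are assumptions, and the eigenvectors carry NumPy's unit Euclidean normalization rather than the normalization convention of this article, so only the shape of the plot is comparable.

```python
import numpy as np

def ring_transition_matrix(n_sites, hop=0.25):
    """Lazy nearest-neighbour walk on a ring; R[x, y] = Pr(x <- y)."""
    R = (1.0 - 2.0 * hop) * np.eye(n_sites)
    sites = np.arange(n_sites)
    R[(sites + 1) % n_sites, sites] = hop   # step forward (wraps in the corner)
    R[(sites - 1) % n_sites, sites] = hop   # step backward (wraps in the corner)
    return R

n_sites = 40
R = ring_transition_matrix(n_sites)
# R is symmetric here, so left and right eigenvectors coincide and eigh applies.
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigenvalues in ascending order
A1 = eigenvectors[:, -2]   # the two slowest non-constant observables
A2 = eigenvectors[:, -3]
radius_squared = A1 ** 2 + A2 ** 2
print(radius_squared.min(), radius_squared.max())
```

Because $A_1$ and $A_2$ span the degenerate eigenspace built from $\cos(2\pi x/N)$ and $\sin(2\pi x/N)$, the points $(A_1(x), A_2(x))$ all lie at the same distance from the origin, i.e., on a circle.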